Pengfei Liu

ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development

Feb 02, 2026

daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

Feb 02, 2026

What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom

Feb 01, 2026

daVinci-Dev: Agent-native Mid-training for Software Engineering

Jan 27, 2026

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts

Jan 16, 2026

One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

Jan 06, 2026

LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Dec 29, 2025

GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Nov 19, 2025

Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training

Nov 03, 2025

InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research

Nov 03, 2025